Parallel Actors and Learners

1 Overview

C. Zhang et al.1 proposed a framework for generating scalable RL implementations, which consists of parallel actors and parallel learners.

One of the major differences from ordinary implementations is that the Segment Tree for PER is built as a K-ary tree instead of a binary tree. According to the authors, the K-ary tree reduces the number of intermediate levels, which speeds up priority updates during insertion. Moreover, the K-ary tree reduces the total memory size and the number of cache misses. (To be honest, a segment tree of 1 million elements fits in the L2 cache of their CPU, so this is not as critical as the authors considered.)
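
To illustrate the idea, here is a minimal K-ary sum tree sketch in Python (our own toy code, not the authors' implementation; the class name and layout are assumptions). With branching factor K, the depth shrinks from log2(N) to logK(N), so each update touches fewer levels:

```python
import numpy as np

class KarySumTree:
    """Toy K-ary sum tree: depth is log_K(N) instead of log_2(N)."""

    def __init__(self, capacity: int, k: int = 16):
        self.k = k
        depth = 1
        while k ** depth < capacity:  # round capacity up to a power of k
            depth += 1
        self.n_leaf = k ** depth
        self.offset = (self.n_leaf - 1) // (k - 1)  # number of internal nodes
        self.tree = np.zeros(self.offset + self.n_leaf)

    def update(self, idx: int, priority: float):
        node = self.offset + idx
        diff = priority - self.tree[node]  # propagate only the difference
        while True:
            self.tree[node] += diff
            if node == 0:
                break
            node = (node - 1) // self.k  # parent in a flat K-ary layout

    def sample_index(self, value: float) -> int:
        """Descend from the root into the child whose prefix sum covers `value`."""
        node = 0
        while node < self.offset:
            first = self.k * node + 1
            for c in range(first, first + self.k):
                # Fall through to the last child to be robust to rounding.
                if value <= self.tree[c] or c == first + self.k - 1:
                    node = c
                    break
                value -= self.tree[c]
        return node - self.offset
```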

In order to allow transitions to be read and written in parallel, the authors proposed “lazy writing”: first set the priority to 0 so that the transition cannot be sampled, then update the transition data, and finally set the true priority. This approach achieves lock-free parallelism for the transition data (though not for the segment tree).
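
A minimal sketch of the lazy-writing protocol, reusing the toy `KarySumTree` above and a plain NumPy array as the transition store (function and variable names are ours, not the paper's):

```python
def lazy_write(tree, data, idx, transition, priority):
    """Lazy writing: hide a slot while its transition is being overwritten."""
    tree.update(idx, 0.0)       # 1. zero priority: the sampler cannot pick idx
    data[idx] = transition      # 2. overwrite the transition data without a lock
    tree.update(idx, priority)  # 3. publish the true priority
```

Only the data write in step 2 is lock-free; the `tree.update` calls still require the segment tree synchronization described next.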

The authors split the segment tree lock into a global lock and a last-level lock, so that priorities can still be read while the intermediate levels are being updated.
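
The paper does not spell out the exact locking code, but the split might look like the following sketch (our guess at the scheme; `tree.leaf` and `rebuild_intermediate` are hypothetical helpers):

```python
import threading

class SplitTreeLock:
    """Guessed sketch: leaf reads and writes only take the last-level lock,
    so they can proceed while intermediate levels are rebuilt under the
    global lock."""

    def __init__(self):
        self.global_lock = threading.Lock()      # guards intermediate levels
        self.last_level_lock = threading.Lock()  # guards the leaf level

    def write_leaf(self, tree, idx, priority):
        with self.last_level_lock:
            tree.leaf[idx] = priority

    def read_leaf(self, tree, idx):
        with self.last_level_lock:
            return tree.leaf[idx]

    def propagate(self, tree):
        with self.global_lock:
            tree.rebuild_intermediate()  # hypothetical: recompute inner sums
```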

Additionally, their program can determine the numbers of actors and learners from the hardware configuration and the desired throughput.

2 Comparison with cpprb

As far as we know, the authors have not published their source code. Once they disclose it, we would like to compare it with our implementation.

Here, we summarize the similarities and differences between their framework and our MPPrioritizedReplayBuffer in the table below.

| | The Paper’s Framework | cpprb.MPPrioritizedReplayBuffer |
|---|---|---|
| Multiple Actors | Yes | Yes |
| Multiple Learners | Yes. Gradients are aggregated at a parameter server. | No |
| Segment Tree | K-ary tree on a 1-d contiguous array. Propagates only the diff value.2 | Binary tree on a 1-d contiguous array. Propagates the value itself. |
| Priority Update | Writes the last level and updates intermediate levels with a lock. | Writes the last level in parallel. Updates intermediate levels with a lock just before sampling to avoid inconsistency. |
| Lock | Segment tree global and last-level locks. (And an atomic index?) | Atomic index, transition data lock, and segment tree lock. (See Ape-X) |
| Lock Architecture | (Maybe) an ordinary exclusive lock. | Parallel writing is allowed (because writes go to different addresses) and the number of writers in the critical section is traced atomically. Reading has higher priority and prevents writers from entering the critical section again; the reader starts working after all writers have exited the critical section. |
| Inconsistency of Sampling and Priority Update | Allows weight inconsistency because it has little impact in practice. | Weights are extracted atomically, too. (Always consistent) |
| Implementation | Written in C++. There is a Python binding to plug into RLlib.3 | Written in C++ (segment tree) and Cython (buffer). Provided as a Python class. |
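
For reference, here is a rough usage sketch of cpprb.MPPrioritizedReplayBuffer based on its documented API (see the cpprb documentation for the full multi-process setup; the shapes and values below are arbitrary):

```python
import numpy as np
from cpprb import MPPrioritizedReplayBuffer

rb = MPPrioritizedReplayBuffer(1024,
                               {"obs": {"shape": (4,)},
                                "act": {"shape": (1,)},
                                "rew": {},
                                "next_obs": {"shape": (4,)},
                                "done": {}})

# Actor side: the buffer can be shared with worker processes.
rb.add(obs=np.ones(4), act=np.ones(1), rew=1.0,
       next_obs=np.zeros(4), done=0.0)

# Learner side: sample with importance weights, then feed back new priorities.
sample = rb.sample(1)
td_error = np.ones_like(sample["weights"])  # placeholder for real TD errors
rb.update_priorities(sample["indexes"], td_error)
```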

  1. C. Zhang et al., “Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations”, HiPC (2021) (arXiv) ↩︎

  2. It is not clear how they manage the min tree. Do they use only a sum tree? ↩︎

  3. E. Liang et al., “RLlib: Abstractions for Distributed Reinforcement Learning”, ICML (2018) (arXiv, code) ↩︎